Two different versions of the count matrix exist: the one from GEO and the one created by us personally from cell ranger
Decided to go with cellranger matrix for analysis (probably no big difference in the end)
## An object of class Seurat
## 19764 features across 4004 samples within 1 assay
## Active assay: RNA (19764 features, 0 variable features)
## 1 layer present: counts
## An object of class Seurat
## 19764 features across 2313 samples within 1 assay
## Active assay: RNA (19764 features, 0 variable features)
## 1 layer present: counts
Now testing scran Other tested options: logNormalize, scTransform
## Finding variable features for layer counts
## When using repel, set xnudge and ynudge to 0 for optimal results
## Centering and scaling data matrix
## Find clusters
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 2313
## Number of edges: 72170
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8372
## Number of communities: 9
## Elapsed time: 0 seconds
## 14:43:14 UMAP embedding parameters a = 0.9922 b = 1.112
## Found more than one class "dist" in cache; using the first, from namespace 'spam'
## Also defined by 'BiocGenerics'
## 14:43:14 Read 2313 rows and found 10 numeric columns
## 14:43:14 Using Annoy for neighbor search, n_neighbors = 30
## Found more than one class "dist" in cache; using the first, from namespace 'spam'
## Also defined by 'BiocGenerics'
## 14:43:14 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 14:43:14 Writing NN index file to temp file /var/folders/p4/zn4sp81s5gq35dzmr0nknsjc0000gn/T//RtmpPrcBwt/file37c6520f3bb2
## 14:43:14 Searching Annoy index using 1 thread, search_k = 3000
## 14:43:15 Annoy recall = 100%
## 14:43:15 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 14:43:16 Initializing from normalized Laplacian + noise (using RSpectra)
## 14:43:16 Commencing optimization for 500 epochs, with 87192 positive edges
## 14:43:19 Optimization finished
Specific MM marker genes: B2M, CCND1, Immunoglobulins (IGHM, IGKC)
Source: https://www.cancer.gov/about-cancer/diagnosis-staging/diagnosis/tumor-markers-list
Additional marker used in the Numbat paper for MM: MZB1
Additional marker used in the Casper paper for MM: SCD1, CD38
##
## control MM undefined
## control 257 4 101
## MM 0 1949 2
## For a (much!) faster implementation of the Wilcoxon Rank Sum Test,
## (default method for FindMarkers) please install the presto package
## --------------------------------------------
## install.packages('devtools')
## devtools::install_github('immunogenomics/presto')
## --------------------------------------------
## After installation of presto, Seurat will automatically use the more
## efficient implementation (no further action necessary).
## This message will be shown once per session
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## RPS4Y1 0.000000e+00 7.873292 0.928 0.025 0.000000e+00
## HLA-DPB1 0.000000e+00 8.670499 0.804 0.025 0.000000e+00
## PTPRC 0.000000e+00 7.679161 0.801 0.024 0.000000e+00
## HLA-DPA1 0.000000e+00 8.760905 0.779 0.021 0.000000e+00
## HLA-DRB1 3.669508e-286 8.624379 0.674 0.014 7.252415e-282
## CD99 1.507533e-284 8.114201 0.666 0.013 2.979488e-280
## [1] 3488
Check which cancer marker genes are within the DE genes:
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## B2M 7.780115e-08 -0.1736831 1.000 1.000 1.537662e-03
## IGHM 7.730073e-28 0.2699701 0.544 0.998 1.527772e-23
## MZB1 1.272309e-140 -2.9218435 0.304 0.964 2.514592e-136
## CD38 7.070348e-21 -0.4336177 0.204 0.569 1.397384e-16
Plot the top positive and negative markers:
print(table(seurat_obj$large_clusters,seurat_obj$copykat_pred))
##
## aneuploid diploid not.defined
## control 14 211 36
## MM 1943 1 5
print(table(seurat_obj$old_annot,seurat_obj$copykat_pred))
##
## aneuploid diploid not.defined
## control 13 208 36
## MM 1944 4 5
## undefined 0 0 0
print(table(seurat_obj$large_clusters,seurat_obj$scevan_pred))
##
## filtered normal tumor
## control 21 226 14
## MM 4 1 1944
print(table(seurat_obj$old_annot,seurat_obj$scevan_pred))
##
## filtered normal tumor
## control 21 223 13
## MM 4 4 1945
## undefined 0 0 0
print(table(seurat_obj$copykat_pred,seurat_obj$scevan_pred))
##
## filtered normal tumor
## aneuploid 0 0 1957
## diploid 0 212 0
## not.defined 25 15 1